ENGINEERED CHIMERA OF PROTEIN FRAGMENTS AND METHODS OF USE THEREOF

BACKGROUND OF THE INVENTION 

Technical Field 

The subject invention encompasses novel proteins 
related to the human immunodeficiency virus (HIV-1) gp41 
protein and to methods of use thereof. For example, the 
proteins may be utilized in the screening of anti-HIV 
compounds.

Background Information 

The causative agent for Acquired Immune Deficiency 
Syndrome (A.I.D.S.) is an enveloped virus called human 
immunodeficiency virus type 1 (HIV-1). The viral envelope
contains a protein complex that is vital for viral entry 
into susceptible cells. This envelope complex specifies 
which cells to infect and mediates the fusion of the viral 
membrane with the host plasma membrane allowing the 
invasion of the viral genome. It is also responsible for 
syncytium formation that occurs when an infected cell fuses 
with a neighboring uninfected cell. 

The envelope complex is composed of two viral 
proteins, gp120 and gp41. These two proteins come from a
common precursor termed gp160, which is cleaved by a
cellular convertase to generate gp120 and gp41 (Fields, B.,
Virology 1996, Lippincott-Raven Publishers, Philadelphia,
pp. 1881-1952). The cleaved products remain noncovalently
associated with each other in an oligomeric form
(gp120/gp41) on the surface of the virion. Binding of
gp120/gp41 to a host receptor, CD4, in conjunction with
either one of two chemokine co-receptors, termed CCR-5 and
CXCR-4, causes a conformational change in the viral
gp120/gp41 complex, whereupon gp41 mediates the fusion of the
viral membrane with the host membrane. 

The gp41 protein is a transmembrane protein with a 
multifaceted ectodomain. At its N-terminus resides a
hydrophobic fusion peptide, which is crucial for membrane
fusion. Next are two 4-3 hydrophobic (heptad) repeats, which
are involved in the formation of coiled coils. The
N-terminal heptad repeat is termed the N-helix and the
C-terminal heptad repeat is called the C-helix. A loop region
is present between these two helices. The heptad regions form
a helical trimer of antiparallel dimers (Lu et al. 1995 Nat.
Struct. Biol. 2, 1075-1082).
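
The "4-3" spacing of a heptad repeat can be made concrete with a short sketch. It assumes the conventional heptad labels a through g, with hydrophobic core residues at positions a and d, so that successive core residues are alternately 3 and 4 residues apart. The sequence used below is a made-up toy repeat, not a gp41 sequence, and the helper names are hypothetical.

```python
HEPTAD = "abcdefg"  # conventional heptad position labels


def heptad_positions(seq: str, first_register: str = "a") -> list[tuple[str, str]]:
    """Pair each residue with its heptad position, starting at first_register."""
    start = HEPTAD.index(first_register)
    return [(res, HEPTAD[(start + k) % 7]) for k, res in enumerate(seq)]


def core_residues(seq: str, first_register: str = "a") -> str:
    """Return the residues at the a and d positions (the coiled-coil core)."""
    return "".join(res for res, pos in heptad_positions(seq, first_register) if pos in "ad")


def core_spacings(seq: str, first_register: str = "a") -> list[int]:
    """Distances between successive a/d residues: alternately 3 and 4 (the 4-3 repeat)."""
    idx = [k for k, (_, pos) in enumerate(heptad_positions(seq, first_register)) if pos in "ad"]
    return [nxt - cur for cur, nxt in zip(idx, idx[1:])]


# Hypothetical toy repeat with Leu at every a position and Ile at every d position.
toy = "LKEIEKK" * 2
print(core_residues(toy))   # LILI
print(core_spacings(toy))   # [3, 4, 3]
```

In a real coiled coil the register must be assigned from sequence analysis or structure; here it is simply supplied by the caller.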

The crystal structure of gp41 demonstrates that the 
gp41 core is a six-helix bundle formed by three molecules 
of gp41 in which the N and C helices are arranged into 
three hairpins (Chan et al. 1997 PNAS 94, 14036-14313; 
Weissenhorn et al. 1997 Nature 387, 426-430). The inner
N-helices form the trimeric coiled coil, with the C-helices
packing in an antiparallel manner into three well-conserved
hydrophobic grooves along the outside of the coiled
coil. In this state, gp41 is in the fusion-active
conformation, bringing the viral and host membranes into
close juxtaposition and overcoming the energy barrier for
membrane fusion (Chan et al. 1997 PNAS 94, 14036-14313;
Furuta et al. 1998 Nat. Struct. Biol. 5, 276-279; Hughson
1997 Curr. Biol. 7, R565-R569; Weissenhorn et al. 1997
Nature 387, 426-430).

Prior to binding to the host cell, gp41 is said to be 
in a nonfusogenic state but, upon binding to CD4/CCR-5 or
CD4/CXCR-4, a conformational change occurs in gp41,
transforming it into the fusogenic state. The hydrophobic
N-terminal peptide of gp41 inserts into the host membrane.
In this situation, gp41 resides in both the viral membrane
and the host membrane. This transient molecular species is
called the prehairpin intermediate (Munoz-Barroso et al.
1998 J. Cell Biol. 140, 315-323; Chan and Kim 1998 Cell 93,
681-684; Furuta et al. 1998 Nat. Struct. Biol. 5, 276-279;
Jones et al. 1998 J. Biol. Chem. 273, 404-409).
Association of the C-helix with the N-helix into a trimer
of helices represents the fusion-active state of gp41.
Synthetic peptides representing the C-helix have been found
to be inhibitors of HIV infection and syncytia formation at
nanomolar concentrations in cell culture experiments (Jiang
et al. 1993 Nature 365, 113; Wild et al. 1994 P.N.A.S. 91,
12676-12680; Lu et al. 1995 Nat. Struct. Biol. 2, 1075-1082;
Chan et al. 1998 P.N.A.S. 95, 15613-15617; Rimsky et al.
1998 J. Virol. 72, 986-993). These peptides bind to the
transiently exposed N-helix coiled coil in the prehairpin
intermediate. One of these peptides, T20, has been shown to
exhibit antiviral activity in humans (Kilby et al. 1998
Nat. Med. 4, 1302-1307), demonstrating the utility of this
treatment strategy.

The crystal structure of the N-terminal coiled coil 
trimer reveals three symmetrically located pockets on its 
surface which bind a conserved motif of Trp-Trp-Ile found 
in the C-peptide. Peptide mimics of the Trp-Trp-Ile motif 
(Eckert et al. 1999 Cell 99, 103-115; see also WO 00/06599)
show anti-fusion and antiviral activity. It is therefore 
desirable to discover small organic molecules which can 
bind the Trp-Trp-Ile binding pocket and which may serve as 
leads in drug discovery. The proteins of the present 
invention provide for such discovery. 

Computational methods to discover such compounds have 
been disclosed (Debnath et al. 1999 J. Med. Chem. 42,
3203-3209). These computations were derived from studies of
proteins with occupied Trp-Trp-Ile pockets. However,
proteins with unoccupied Trp-Trp-Ile pockets would be more
useful for drug discovery efforts. One such protein,
containing only 17 residues of the HIV-1 gp41 N-helix fused
to a 29-residue yeast GCN4 protein analog, has been
described (Eckert et al. 1999 Cell 99, 103-115). The
proteins described in the present invention differ
significantly from this yeast construct in that residues
from both the N- and C-peptide portions of HIV-1 gp41 are
used and present an unoccupied Trp-Trp-Ile pocket useful for
drug discovery. Furthermore, the Eckert et al. protein has an
exposed N-helical region of GCN4 that may engender
unanticipated and/or undesirable binding properties (see
also WO 00/55377). The N-helical portion of the proteins
described herein, in contrast, has only its Trp-Trp-Ile
pocket exposed, with the remainder of the N-helix protected
by an attached C-helix region. Therefore, the proteins of
the present invention will not have the unanticipated
and/or undesirable binding properties that might be
observed in connection with the Eckert et al. protein (see
also WO 00/55377).

Additionally, there are several reasons why the pocket 
(i.e., Trp-Trp-Ile) is important for targeting drugs. 
First, mutagenesis studies show N-peptide residues forming 
the pocket are critical for membrane fusion (Dubay et al.
1992 J. Virol. 66, 4748-4756; Cao et al. 1993 J. Virol. 72,
2747-2755; Chen et al. 1993 J. Virol. 67, 3615-3619; Wild
et al. 1994 P.N.A.S. 91, 12676-12680; Weng and Weiss 1998
J. Virol. 72, 9676-9682). Second, C34 peptide variants
show that the C34 inhibitory activity depends on its
ability to bind to the pocket (Chan et al. 1998 P.N.A.S.
95, 15613-15617). Third, it may be difficult for HIV-1 to
become resistant to drugs that target this pocket because
the residues comprising this pocket are highly conserved;
the segment of mRNA encoding these residues is part of the
Rev-response element (Malim et al. 1989 Nature 338,
254-257; Zapp and Green 1989 Nature 342, 714-716); and
C-peptides lacking pocket-binding residues are more
vulnerable to the emergence of resistant virus than
C-peptides that contain pocket-binding residues (Rimsky et
al. 1998 J. Virol. 72, 986-993). Small molecules that bind
to the hydrophobic pocket of the gp41 core might be
expected to function as inhibitors in the same dominant
negative manner as C-peptides (Chan et al. Cell 89,
263-273). Thus, a strong need exists for an understanding of
the structure and sequence of the pocket so that one may
screen for pharmaceutical compounds which have the ability
to bind to the pocket, thereby acting as antagonists to the
virus.

All U.S. patents and publications referred to herein 
are hereby incorporated in their entirety by reference. 

SUMMARY OF THE INVENTION
The present invention is directed to novel proteins 
related to the HIV-1 gp41 protein which are useful for 
screening compounds for anti-HIV activity. In particular, 
the present invention includes an isolated nucleotide
sequence selected from the group consisting of SEQ ID NO:1,
SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, and a nucleotide
sequence having at least 65% identity to a sequence
selected from the group consisting of SEQ ID NO:1, SEQ ID
NO:2, SEQ ID NO:3 and SEQ ID NO:4.

The present invention also includes a purified 
polypeptide encoded by a nucleotide sequence selected from 
the group consisting of SEQ ID NO:1, SEQ ID NO:2, SEQ ID
NO:3, SEQ ID NO:4, and a nucleotide sequence having at
least 65% identity to a sequence selected from the group
consisting of SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3 and SEQ
ID NO:4.

Additionally, the present invention encompasses a 
purified polypeptide having an amino acid sequence selected 
from the group consisting of SEQ ID NO:5, SEQ ID NO:6, SEQ
ID NO:7, SEQ ID NO:8, and an amino acid sequence having at
least 65% similarity, preferably at least 75% similarity, 
and more preferably at least 90% similarity to a sequence 
selected from the group consisting of SEQ ID NO: 5, SEQ ID 
NO: 6, SEQ ID NO: 7 and SEQ ID NO: 8. 

Further, the present invention also includes a vector 
comprising one of the above nucleotide sequences as well as 
a host cell comprising the vector. 

The present invention also encompasses a method of
producing a protein having an unoccupied Trp-Trp-Ile pocket
comprising the steps of: a) isolating a nucleotide sequence
selected from the group consisting of SEQ ID NO:1, SEQ ID
NO:2, SEQ ID NO:3, SEQ ID NO:4, and a nucleotide sequence
having at least 65% identity to a sequence selected from
the group consisting of SEQ ID NO:1, SEQ ID NO:2, SEQ ID
NO:3 and SEQ ID NO:4; b) constructing a vector comprising
1) said nucleotide sequence of step (a) linked to 2) a 
promoter in an operable manner; and c) transforming a host 
cell with said vector of step (b) under time and conditions 
suitable for expression of the protein. 

Additionally, the present invention includes a method 
of detecting a compound which binds to gp41 protein 
comprising the steps of: a) contacting the compound of 
interest with a polypeptide having an amino acid sequence 
selected from the group consisting of SEQ ID NO:5, SEQ ID
NO:6, SEQ ID NO:7, SEQ ID NO:8, and an amino acid sequence
having at least 65% similarity to a sequence selected from
the group consisting of SEQ ID NO:5, SEQ ID NO:6, SEQ ID
NO:7 and SEQ ID NO:8, for a time and under conditions
sufficient for the formation of compound/polypeptide
complexes; and b) detecting the presence of the complexes,
wherein detection indicates the presence of a compound which
binds to gp41 protein. The polypeptide of step (a) may be
attached to a solid phase prior to performing step (a).

Furthermore, the solid phase may be, for example, a porous 
or non-porous material, a latex particle, a magnetic 
particle, a microparticle, a bead, a membrane, a microtiter 
well or a plastic tube. 

The present invention also includes a method of
detecting a compound which binds to gp41 protein comprising
the steps of: a) adding an indicator reagent capable of
generating a measurable signal to a polypeptide having an
amino acid sequence selected from the group consisting of
SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, and an
amino acid sequence having at least 65% similarity to a
sequence selected from the group consisting of SEQ ID NO:5,
SEQ ID NO:6, SEQ ID NO:7 and SEQ ID NO:8, for a time and
under conditions sufficient for the formation of indicator
reagent/polypeptide complexes; b) contacting said indicator
reagent/polypeptide complexes with said compound, for a
time and under conditions sufficient for the formation of
indicator reagent/polypeptide/compound complexes; and
c) detecting a measurable signal generated by the indicator
reagent, the measurable signal indicating presence of a
compound which binds to gp41 protein. The indicator
reagent may be, for example, an enzyme such as horseradish
peroxidase, beta-galactosidase or alkaline phosphatase, a
luminescent compound, a radioactive element, a visual label
or a chemiluminescent compound.

Moreover, the present invention also includes an 
antibody produced in response to or directed against any of
the polypeptides described above. The antibody may be
either monoclonal or polyclonal.

The present invention also encompasses a method for 
producing antibodies to gp41 protein comprising 
administering to a mammal a polypeptide having an amino
acid sequence selected from the group consisting of SEQ ID
NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, and an amino
acid sequence having at least 65% similarity to a sequence
selected from the group consisting of SEQ ID NO:5, SEQ ID
NO:6, SEQ ID NO:7 and SEQ ID NO:8, in an amount sufficient
to produce an immune response. 

Additionally, the invention also includes a vaccine 
for treatment of human immunodeficiency virus type 1 (or 
Acquired Immune Deficiency Syndrome caused thereby)
comprising an antibody noted above and a pharmaceutically
acceptable excipient.

Also, the present invention encompasses a method of 
detecting compounds which bind to gp41 protein from a 
mixture of compounds having unknown binding properties 
comprising the steps of: a) contacting at least one 
polypeptide having an amino acid sequence selected from the 
group consisting of SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, 
SEQ ID NO: 8, and an amino acid sequence having at least 65% 
similarity to a sequence selected from the group consisting
of SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7 and SEQ ID NO:8, with
said compound mixture for a time and under conditions
sufficient for the formation of polypeptide/compound
complexes; b) passing said mixture through a means having
pores which allow only molecules below a certain molecular
weight to pass through; and c) detecting retained
polypeptide/compound complexes which did not pass through
the pores, wherein compounds present in the complexes bind
to gp41 protein. The means may be, for example, a filter. 

Additionally, the present invention includes a method 
of detecting compounds which bind to gp41 protein from a 
mixture of compounds having unknown binding properties 

comprising the steps of: a) contacting at least one
polypeptide having an amino acid sequence selected from the
group consisting of SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7,
SEQ ID NO:8, and an amino acid sequence having at least 65%
similarity to a sequence selected from the group consisting
of SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7 and SEQ ID NO:8,
with said compound mixture, for a time and under conditions
sufficient for the formation of polypeptide/compound
complexes; b) passing the mixture through a means that will
pass molecules or complexes of a larger molecular size
faster than those of smaller molecular size; and c)
detecting the separated polypeptide/compound complexes
which passed through the means more quickly than the
smaller molecules or complexes, wherein compounds present
in the complexes which passed through said means at the
faster rate bind to gp41 protein. This means may be, for
example, a size exclusion resin. (See Dunayevskiy et al.,
Rapid Communications in Mass Spectrometry, 11:1178-1184
(1997) and Kaur et al., Journal of Protein Chemistry,
16(5):505-511 (1997).)

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 illustrates the manner in which the present 
protein was constructed (i.e., annealing reaction, fill in, 
ligation and subsequent PCR reaction). The nucleotide
sequences of the oligos are shown in the annealing
reaction. Each oligo hydrogen-bonded with complementary
sequences of another oligo (e.g., gp41 Antisense #2 with
gp41 Sense #1 and gp41 Antisense #3) as the temperature of
the reaction decreased from 95 °C to 25 °C.

Figure 2 represents the isolated nucleotide sequences 
of the proteins of the present invention. In particular, 
the nucleotide sequences of the gp41 clones 1 (SEQ ID
NO:1), 1a (SEQ ID NO:2), 1b (SEQ ID NO:3) and 1c (SEQ ID
NO:4) are shown. The linker codons for each clone,
(GGGG)R*, (GDG)R, (GSG)P, and (GSNDG)R (SEQ ID NOS:18
through 21, respectively), are boldfaced. *The R residue is
present on the natural gp41 sequence. It was changed to a
P residue in the clone 1b sequence.

Figure 3 represents a sequence alignment of pNL4-3 
(SEQ ID NO:17, GenBank Accession No. M19921) , clone 1 and 
the consensus HIV gp41 (strain HXB2) amino acid sequences. 
The wild-type sequence pNL4-3 was rearranged in the context 
of the re-engineered protein (i.e., C-Helix (linker) N-Helix) 
for purposes of this alignment. 

Figure 4 represents the amino acid sequences of the 
proteins of the present invention (i.e., Clone #1 = SEQ ID 
NO:5, Clone #1a (GDG)R = SEQ ID NO:6, Clone #1b (GDG)P = SEQ
ID NO:7, Clone #1c (GSNDG)R = SEQ ID NO:8).

Figure 5 represents the 1-D 1H-NMR spectrum of clone 1b
and the selective shifting of resonances upon the addition
of Trp-Arg-Trp-Arg-Ile pentapeptide (SEQ ID NO:28).

Figure 6 shows the crystal structure of trimethyl lead
acetate as soaked into the crystals of clone 1b.

Figure 7 represents the results of a sedimentation 
equilibrium study of the gp41 clone 4 construct. 

Figure 8 illustrates the binding of cyclic D-peptides
to gp41 examined by centrifugal enhanced affinity
selection.

Figure 9 represents the nucleotide sequence of clone 4 
(SEQ ID NO:9). The four Gly linker codons (SEQ ID NO:22)
are shown in boldface. 

Figure 10 represents the nucleotide sequences of gp41 
clones 4 (SEQ ID NO:9), 4a (SEQ ID NO:10), 4b (SEQ ID 
NO:11), and 4c (SEQ ID NO:12). The linker codons are
boldfaced. *The R residue is present on the natural gp41
sequence. It was changed to a P residue in the clone 4b
sequence.

Figure 11 illustrates the amino acid sequences of 
clones 4 (SEQ ID NO:13), 4a (SEQ ID NO:14), 4b (SEQ ID 
NO:15), and 4c (SEQ ID NO:16). The sequence designated as
MGHHHHHHHSSGHIDDDDK (SEQ ID NO:27) represents a His tag/E.K.
cleavage site.

DETAILED DESCRIPTION OF THE INVENTION
The subject invention relates to proteins which may be
utilized to screen for anti-human immunodeficiency virus
(HIV) compounds. Such compounds may, in turn, be used to
treat HIV-afflicted individuals or to protect susceptible
individuals.

The isolated nucleotide sequences which encode the
amino acid sequences of the proteins of the present
invention are shown in Figure 2. The present invention
encompasses not only these nucleotide sequences but also
fragments thereof, complements of the full sequences
thereof, and fragments of the complements. Additionally,
the invention includes sequences corresponding to (i.e.,
having identity to) or complementary to at least about 65%,
preferably at least about 75%, and more preferably at least
about 90% of the nucleotide sequences shown in Figure 2.
Furthermore, the present invention includes fragments of
the full sequences, complements of these sequences, as well
as fragments of the complements.

WO 02/34909                                   PCT/US01/48040

For purposes of the present invention, a "fragment" of 
a nucleotide sequence is defined as a contiguous sequence 
of at least approximately 120, preferably at least about
140, more preferably at least about 160 nucleotides, and
even more preferably at least about 200 nucleotides
corresponding to a region of the specified nucleotide 
sequence. 

Furthermore, for purposes of the present invention, a 
"complement" is defined as a sequence which pairs to a
given sequence based upon base-pairing rules. For example,
a sequence A-G-T in one nucleotide strand is "complementary"
to T-C-A in the other strand.
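
The "fragment" and "complement" definitions above can be sketched in a few lines. The 120-nucleotide minimum is the least-preferred fragment length given above; the function names are hypothetical illustrations, and only the four standard DNA bases are handled. Because the two strands of a duplex are antiparallel, the base-by-base complement (T-C-A for A-G-T) is written here in the same left-to-right order as the given strand, while the reverse complement gives the other strand read 5' to 3'.

```python
# Watson-Crick base-pairing rules for DNA.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Base-by-base complement in the same left-to-right order (A-G-T -> T-C-A)."""
    return "".join(PAIR[base] for base in seq.upper())


def reverse_complement(seq: str) -> str:
    """Complementary strand read 5'->3' (strands in a duplex are antiparallel)."""
    return complement(seq)[::-1]


def is_fragment(candidate: str, full: str, min_len: int = 120) -> bool:
    """True if candidate is a contiguous stretch of full of at least min_len nucleotides."""
    return len(candidate) >= min_len and candidate.upper() in full.upper()


print(complement("AGT"))          # TCA
print(reverse_complement("AGT"))  # ACT
```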

Sequence identity or percent identity is the number of
exact matches between two aligned sequences divided by the
length of the shorter sequence and multiplied by 100. An
approximate alignment for nucleic acid sequences is provided
by the local homology algorithm of Smith and Waterman,
Advances in Applied Mathematics 2:482-489 (1981). This
algorithm may be extended to use with peptide or protein
sequences using the scoring matrix created by Dayhoff, Atlas
of Protein Sequences and Structure, M.O. Dayhoff ed., 5
suppl. 3:353-358, National Biomedical Research Foundation,
Washington, D.C., USA, and normalized by Gribskov, Nucl.
Acids Res. 14(6):6745-6763 (1986). An implementation of
this algorithm for nucleic acid and peptide sequences is
provided by the Genetics Computer Group (Madison, WI) in the
BestFit utility application. In particular, the default
parameters of Gap Creation Penalty of 8 and Gap Extension
Penalty of 2, as described in the Wisconsin Sequence
Analysis Package Program Manual, Version 8 (1995) (available
from Genetics Computer Group, Madison, WI), were used. Other
equally suitable programs for calculating the percent
identity or similarity between sequences are generally known
in the art.
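
As a rough illustration of how such a percent identity might be computed, the following sketch performs a Smith-Waterman local alignment with affine gap penalties and then divides the number of exact matches by the length of the shorter sequence, as in the definition above. It is not the BestFit implementation: the match/mismatch scores (+5/-4) are assumptions, and the gap convention used here (a gap of length k costs 8 + 2*(k - 1)) is only one reading of the Gap Creation 8 / Gap Extension 2 defaults, which GCG may apply differently.

```python
NEG = float("-inf")


def smith_waterman_affine(a, b, match=5, mismatch=-4, gap_open=8, gap_extend=2):
    """Local alignment with affine gaps (Gotoh-style three-state DP).

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    Assumed gap cost: gap_open + gap_extend * (k - 1) for a gap of length k.
    """
    n, m = len(a), len(b)
    M = [[NEG] * (m + 1) for _ in range(n + 1)]  # ends with a[i-1] aligned to b[j-1]
    E = [[NEG] * (m + 1) for _ in range(n + 1)]  # ends with b[j-1] aligned to a gap
    F = [[NEG] * (m + 1) for _ in range(n + 1)]  # ends with a[i-1] aligned to a gap
    ptr = {}  # (state, i, j) -> previous state (None = local alignment starts here)
    best, best_pos = 0, None

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Match state: start a new local alignment (0) or extend any state.
            prev, src = max([(0, None), (M[i - 1][j - 1], "M"),
                             (E[i - 1][j - 1], "E"), (F[i - 1][j - 1], "F")],
                            key=lambda t: t[0])
            M[i][j], ptr[("M", i, j)] = prev + s, src
            # Horizontal gap: open from M or F, or extend an existing E gap.
            E[i][j], ptr[("E", i, j)] = max([(M[i][j - 1] - gap_open, "M"),
                                             (E[i][j - 1] - gap_extend, "E"),
                                             (F[i][j - 1] - gap_open, "F")],
                                            key=lambda t: t[0])
            # Vertical gap: open from M or E, or extend an existing F gap.
            F[i][j], ptr[("F", i, j)] = max([(M[i - 1][j] - gap_open, "M"),
                                             (F[i - 1][j] - gap_extend, "F"),
                                             (E[i - 1][j] - gap_open, "E")],
                                            key=lambda t: t[0])
            if M[i][j] > best:
                best, best_pos = M[i][j], (i, j)

    if best_pos is None:
        return 0, "", ""

    # Trace back from the best-scoring match cell.
    out_a, out_b = [], []
    (i, j), state = best_pos, "M"
    while state is not None:
        prev_state = ptr[(state, i, j)]
        if state == "M":
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif state == "E":
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
        else:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        state = prev_state
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a, aln_b, len_a, len_b):
    """Exact matches in the alignment / length of the shorter sequence * 100."""
    matches = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * matches / min(len_a, len_b)


score, x, y = smith_waterman_affine("ACGTACGT", "ACGTCGT")
print(score, x, y, percent_identity(x, y, 8, 7))  # 27 ACGTACGT ACGT-CGT 100.0
```

Dividing by the shorter sequence means a sequence that aligns perfectly inside a longer one, as in the example, scores 100% identity even though the longer sequence contains an extra base.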

Sequences related to the nucleotide sequence of the
present invention may be derived from non-human sources
(e.g., bacterial, viral, mammalian, etc.) and are also
covered by the present invention. Functional equivalents of
the above sequences (i.e., sequences having the ability to
conform to the secondary and tertiary structure of the
pocket) are also encompassed by the present invention; such
equivalents hybridize to the present nucleotide sequences.

A nucleic acid molecule is "hybridizable" to another
nucleic acid molecule when a single-stranded form of the
nucleic acid molecule can anneal to the other nucleic acid
molecule under the appropriate conditions of temperature and
ionic strength (see Sambrook et al., Molecular Cloning: A
Laboratory Manual, Second Edition (1989), Cold Spring Harbor
Laboratory Press, Cold Spring Harbor, New York). The
conditions of temperature and ionic strength determine the
"stringency" of the hybridization. "Hybridization"
requires that two nucleic acid sequences contain
complementary sequences. However, depending on the
stringency of the hybridization, mismatches between bases
may occur. The appropriate stringency for hybridizing
nucleic acids depends on the length of the nucleic acids and
the degree of complementation. Such variables are well
known in the art. More specifically, the greater the degree
of similarity or homology between two nucleotide sequences,
the greater the value of Tm (melting temperature) for
hybrids of nucleic acids having those sequences. For hybrids
of greater than 100 nucleotides in length, equations for
calculating Tm have been derived (see Sambrook et al.,
supra). For hybridization with shorter nucleic acids, the
position of mismatches becomes more important, and the
length of the oligonucleotide determines its specificity
(see Sambrook et al., supra).
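
For hybrids longer than about 100 nucleotides, one commonly quoted empirical form of the Sambrook et al. equation is sketched below. Treat it as an approximation for perfectly matched DNA-DNA hybrids in sodium-containing buffers; the constants are reproduced from that widely cited form rather than from this document, and the probe sequence and function names are hypothetical.

```python
import math


def gc_percent(seq: str) -> float:
    """Percentage of G + C bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def tm_long_hybrid(seq: str, na_molar: float, formamide_percent: float = 0.0) -> float:
    """Approximate Tm (deg C) of a perfectly matched DNA hybrid longer than ~100 nt.

    Assumed form: Tm = 81.5 + 16.6*log10[Na+] + 0.41*(%G+C) - 0.63*(%formamide) - 600/L
    """
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent(seq)
            - 0.63 * formamide_percent
            - 600.0 / len(seq))


probe = "ACGT" * 50  # hypothetical 200-nt probe, 50% G+C
print(round(tm_long_hybrid(probe, 1.0), 1))   # 99.0
print(round(tm_long_hybrid(probe, 0.1), 1))   # 82.4
```

Lowering the salt concentration or adding formamide lowers Tm, which is how stringency is raised in practice.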

Additionally, the present invention encompasses the 
purified polypeptides or proteins encoded by the nucleotide 
sequences illustrated in Figure 2. The amino acid sequences 
of the proteins are shown in Figure 4. The invention also 
includes those peptides, polypeptides or proteins, or 
fragments thereof, having an amino acid sequence that has at 
least about 65% amino acid similarity, preferably at least 
about 75% amino acid similarity, and more preferably at 
least about 90% amino acid similarity to the amino acid 
sequences resulting from the translation of the nucleotide 
sequences present in Figure 2. For purposes of the present
invention, "similarity" is defined as the exact amino acid
to amino acid comparison of two or more polypeptides at the
appropriate place, where amino acids are identical or
possess similar chemical and/or physical properties such as
charge or hydrophobicity. "Percent similarity" is
calculated between the compared polypeptide sequences using
programs known in the art (see above).
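
A minimal sketch of percent similarity on an already aligned pair follows. The residue groups are an illustrative assumption (a simple charge/hydrophobicity grouping), not the Dayhoff-based scoring used by the programs referred to above, and the denominator follows the shorter-sequence convention used for percent identity.

```python
# Hypothetical grouping of amino acids with similar chemical/physical properties.
GROUPS = ["AVLIM", "FWY", "ST", "DE", "KRH", "NQ", "G", "P", "C"]
GROUP_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}


def percent_similarity(aln_a: str, aln_b: str, len_a: int, len_b: int) -> float:
    """Count aligned positions that are identical or fall in the same property group."""
    similar = 0
    for x, y in zip(aln_a, aln_b):
        if x == "-" or y == "-":
            continue  # gap positions never count as similar
        gx, gy = GROUP_OF.get(x), GROUP_OF.get(y)
        if x == y or (gx is not None and gx == gy):
            similar += 1
    return 100.0 * similar / min(len_a, len_b)


print(percent_similarity("WWIK", "WYLR", 4, 4))  # 100.0 (W=W, W~Y, I~L, K~R)
print(percent_similarity("WWIK", "WWID", 4, 4))  # 75.0 (K and D differ in charge)
```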

It should also be noted that the clones described
herein have an initiating Met codon (ATG) engineered at the
5' end of the gene. Gp41 proteins purified from E. coli
harboring these cDNA clones have the N-terminal initiator
methionine residue clipped off. This phenomenon occurs
during the process of translation within the bacterial host.
The formyl groups present on the initiating Met residue of
polypeptides in prokaryotes (e.g., E. coli) are generally
removed rapidly during biosynthesis by a "deformylase"
enzyme. In many cases, the Met residue is also removed by a
specific aminopeptidase. Consequently, not all mature
proteins have Met as the amino-terminal residue, even though
all are synthesized in this manner.
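
The aminopeptidase step can be approximated with a commonly cited rule of thumb: E. coli methionine aminopeptidase tends to remove the initiator Met when the next residue is small (Gly, Ala, Ser, Cys, Pro, Thr or Val). The sketch below applies that rule to a protein sequence. It is a simplification (processing with Thr or Val in the second position is often incomplete), and the function name is hypothetical. The example input matches the start of the His-tag sequence given for the clone 4 series, whose second residue is Gly.

```python
# Residues commonly reported to permit initiator-Met removal when in position 2.
SMALL_PENULTIMATE = set("GASCPTV")


def predicted_mature_n_terminus(protein: str) -> str:
    """Drop the initiator Met if the second residue is small; otherwise keep it."""
    if len(protein) >= 2 and protein[0] == "M" and protein[1] in SMALL_PENULTIMATE:
        return protein[1:]
    return protein


print(predicted_mature_n_terminus("MGHHHHHH"))  # GHHHHHH
print(predicted_mature_n_terminus("MKVLE"))     # MKVLE (a bulky Lys blocks removal)
```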

In terms of usage of the proteins of the present
invention, such purified proteins may be utilized for many
purposes. For example, the proteins may be used to screen
for compositions which bind to the "Trp-Trp-Ile" pocket of
the proteins and would therefore also bind to the
pocket of gp41. In particular, if one is able to identify
compounds that bind to gp41, then one may prevent further
replication of the virus and the resulting
pathophysiological changes (e.g., wasting, dementia, liver
involvement, etc.) associated with AIDS. One may also
prevent initial infection with HIV type 1.

The identification of compounds which bind to gp41, by 
use of the proteins of the present invention, may be 
carried out by the use of, for example, drug screening 
assays. Initially, however, one must produce the proteins 
for use in the assays. 

Expression of the Proteins:

In order to express the proteins of the present
invention, a vector is first constructed comprising at
least one of the isolated DNA sequences encoding the
complete protein or proteins of interest. The vector may
be, for example, a plasmid, a bacteriophage or a cosmid.
The vector is then introduced into a eukaryotic or
prokaryotic host cell (e.g., mammalian cells such as
Chinese Hamster Ovary (CHO) cells, yeast cells, insect
cells, or bacterial cells such as E. coli) under time and
conditions suitable for production of the protein, by a
method commonly known in the art (see, e.g., Sambrook et
al., Molecular Cloning: A Laboratory Manual, 2nd Edition,
Cold Spring Harbor Laboratory Press (1989)).

The protein may be isolated from the host cell
expressing the protein according to procedures known in
the art including, for example, ammonium sulfate
precipitation, fractionation column chromatography (e.g.,
ion exchange, gel filtration, electrophoresis and affinity
chromatography) and ultimately crystallization (see,
e.g., "Enzyme Purification and Related Techniques", Methods
in Enzymology, 22, 233-577 (1971)).

Of course, the proteins of the present invention, or
portions thereof, may also be prepared by chemical
synthesis by use of techniques well known in the art, such
as solid phase synthesis (Merrifield, 1964, J. Am. Chem.
Soc. 85:2149-2154) or synthesis in homogeneous solution
(Houben-Weyl, 1987, Methods of Organic Chemistry, ed. E.
Wünsch, Vol. 15 I and II, Thieme, Stuttgart).

Drug Screening Assays
Once the "final" proteins of interest are produced,
drug screening assays may be carried out in order to
identify antagonists of the present proteins and thus
compounds which bind to gp41. One such assay involves
adding a signal-generating compound or label to the
"final protein" comprising the linker and the unnatural
domain. Suitable labels include chromogens, radioisotopes
(e.g., 125I, 131I, 32P, 3H, 35S and 14C), fluorescent
compounds (e.g., fluorescein and rhodamine),
chemiluminescent compounds, particles (visible or
fluorescent), nucleic acids, complexing agents, and
catalysts such as enzymes (e.g., alkaline phosphatase, acid
phosphatase, horseradish peroxidase, beta-galactosidase
and ribonuclease), wherein addition of a chromogenic,
fluorogenic or lumogenic substrate results in generation of
the detectable signal. The labelled protein is then added
to the composition of interest. Upon binding between the
protein and the composition of interest, a signal is
generated by the signal-generating compound and detected
either visually or through instrumentation. Such a signal
indicates that the composition of interest would bind to
gp41 in vitro or in vivo.
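
How a measured signal is turned into a binding call is not specified here. One common convention, assumed purely for illustration, is to flag any compound whose signal exceeds the mean of no-binder control wells by more than three standard deviations. The compound names and readings below are hypothetical.

```python
from statistics import mean, stdev


def call_binders(signals: dict[str, float], negative_controls: list[float], k: float = 3.0):
    """Flag compounds whose signal exceeds mean + k*SD of the no-binder controls."""
    threshold = mean(negative_controls) + k * stdev(negative_controls)
    hits = sorted(name for name, value in signals.items() if value > threshold)
    return hits, threshold


# Hypothetical plate readings (arbitrary signal units).
controls = [100.0, 102.0, 98.0, 100.0]
readings = {"cpd-1": 250.0, "cpd-2": 101.0, "cpd-3": 120.0}
hits, cutoff = call_binders(readings, controls)
print(hits)              # ['cpd-1', 'cpd-3']
print(round(cutoff, 1))  # 104.9
```

Any flagged compound would still need confirmation, for example by one of the separation-based methods described below.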

It should be noted that the protein may be added to a 
solid phase initially, if desired. Examples of solid 
phases include porous and non-porous materials, latex
particles, magnetic particles, microparticles, beads,
membranes, microtiter wells and plastic tubes.

Additionally, in order to identify compositions which
bind to gp41 using the present proteins, one may utilize
an Affinity-Selection method known to those of ordinary
skill in the art (see, e.g., U.S. Patent No. 5,891,742 and
U.S. Patent No. 5,670,326).
Briefly, one or more of the purified proteins of the
present invention is mixed with several test compounds of
interest. The mixture is passed through a filter which
will retain protein combined with bound compound and which
allows unbound compound to pass through. A typical
membrane of use has a molecular weight cutoff of
5000-10000 Daltons. Compositions that bind to the
protein(s) and, in particular, the Trp-Trp-Ile pocket of
the protein(s), will be retained by the filter. The unbound
compounds are not retained by the filter and can therefore
be separated from the bound compositions. The retained
compositions (i.e., those bound to the protein) may be
utilized in preventing and treating, for example, AIDS.
Alternatively, a sizing column with an exclusion limit of
5000 Daltons or below, such as Sephadex G-25, can be
employed. A third method for separating bound and unbound
molecules is centrifugal enhanced affinity selection (e.g.,
spin screen; see U.S. Patent Appln. Ser. No. 09/270,427,
incorporated herein by reference in its entirety).
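
Whether a compound is retained by the filter depends on how much of it is bound to the protein at equilibrium. Assuming simple 1:1 binding with a dissociation constant Kd (a model not stated in this document), the complex concentration follows from the standard quadratic solution, and the fraction of compound retained with the protein can be estimated as sketched below. Concentrations are in µM; the values and function names are hypothetical.

```python
import math


def complex_concentration(p_total: float, l_total: float, kd: float) -> float:
    """[PL] for 1:1 binding P + L <-> PL at equilibrium (same units throughout)."""
    s = p_total + l_total + kd
    return (s - math.sqrt(s * s - 4.0 * p_total * l_total)) / 2.0


def fraction_retained(p_total: float, l_total: float, kd: float) -> float:
    """Fraction of the compound bound to (and so retained with) the protein."""
    return complex_concentration(p_total, l_total, kd) / l_total


# Hypothetical screen: 10 uM protein, 1 uM of each compound, three binding strengths.
for kd in (0.1, 1.0, 100.0):
    print(kd, round(fraction_retained(10.0, 1.0, kd), 3))
```

Unbound small molecules, well below the membrane cutoff, pass through, so the retained fraction approximates the bound fraction; dissociation during filtration can lower it in practice, which is why weak binders may be missed.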

It should also be noted that the proteins of the
present invention may be used in drug screening assays or
methods other than Affinity-Selection. For example,
CrystaLead technology may be utilized (see WO 99/45379), as
well as SAR by NMR technology (see WO 98/48264), to identify
pharmaceutical compounds having the ability to bind to
gp41. Drug screening by use of the proteins of the present
invention may also be accomplished using high-throughput
screening (see Eckert et al., WO 00/06599). Computational
docking may also be used to screen for pharmaceuticals of
interest (see Debnath et al., 1999 J. Med. Chem. 42,
3203-3209 and Kuntz et al., 1994 Acc. Chem. Res. 27(5),
117-123).

Another method which may be utilized for screening
compounds which bind to the proteins of the present
invention, and thus to gp41, is termed "spin screen" (see,
e.g., Holzman et al., Abstract, "Identification of Ligands
by Spin Screen: A Lead Discovery and Refinement Process",
presented at the "Drug Discovery Technology" seminar,
August 16-19, 1999, Boston, MA).

Once compounds have been identified which have the ability to bind to gp41, such compositions may be administered to patients having, for example, Acquired Immune Deficiency Syndrome (AIDS) or a precursor stage thereof, or may be utilized in the prevention of the disease. Furthermore, any condition caused by a virus which comprises the gp41 protein may also be treated by administration of compounds, identified by the proteins and methods of the present invention, which bind to gp41.

The pharmaceutical composition may comprise a therapeutically effective amount of the active drug, discovered as described above, and an appropriate physiologically acceptable carrier (e.g., water, buffered water or saline). The dosage, form (e.g., suspension, tablet, capsule, etc.), and route of administration of the pharmaceutical composition (e.g., oral, topical, intravenous, subcutaneous, etc.) may be readily determined by a medical practitioner and may depend upon such factors as, for example, the patient's age, weight, immune status, and overall health.

Antibody Production
The present invention also encompasses antibodies (e.g., monoclonal or polyclonal) or portions thereof produced in response to the present proteins. (See U.S. Patent No. 4,196,265 for a discussion of the production of monoclonal antibodies; see also Kohler et al., Nature, 1975, 256:495-497. For a discussion of polyclonal antibody production, see, e.g., ANTIBODIES: A LABORATORY MANUAL, Cold Spring Harbor Laboratory, 1988.) These antibodies may be produced, for example, by injecting a mammal (e.g., a mouse or a goat) with the proteins described herein or a portion thereof (see, e.g., ANTIBODIES: A LABORATORY MANUAL, supra; see also U.S. Patent No. 4,196,265). Furthermore, an adjuvant such as Freund's adjuvant may also be injected with the protein. An immune response will then be elicited and antibodies will be produced by the animal. Such antibodies may then be purified from the blood of the mammal.

The anti-protein antibodies of the present invention have significant utility in immunoassays for the detection of gp41 due to their cross-reactivity therewith. Thus, such antibodies may be utilized for diagnosis as well as prognostic monitoring of HIV as indicated through the presence of gp41. Immunoassays included within the present invention encompass, but are not limited to, those described in U.S. Pat. No. 4,367,110 (double monoclonal antibody sandwich assay) and U.S. Pat. No. 4,452,901 (western blot). Other assays include, for example, immunoprecipitation of labeled ligands and immunocytochemistry, both in vitro and in vivo. Preferred assays are, for example, enzyme-linked immunosorbent assays (ELISAs) and radioimmunoassays (RIAs), both of which are well known in the art.

Additionally, the present invention includes vaccines comprising the above-described antibodies or portions thereof. The antibodies may be administered to an individual (e.g., an individual recently infected with HIV, an individual having AIDS or a non-infected individual) with, for example, an appropriate carrier (e.g., water, buffered water or saline). Subsequent to administration of the vaccine, the antibodies will bind to expressed gp41 in the body to form a complex, thereby preventing further replication of the virus, or even initial replication of the virus in a recently infected patient, such that symptoms would never manifest.

The present invention may be illustrated by the use of the following non-limiting examples:


Example I 

Construction of the Nucleotide Sequence Corresponding to the gp41 Gene

The nucleotide sequence corresponding to the reengineered gp41 gene was synthesized as a series of four overlapping oligonucleotides. The oligonucleotides had the following sequences:



gp41 Sense #1 (SEQ ID NO: 23)

5'-TACACAAGCTTGATCGACTCTCT...

gp41 Antisense #2 (SEQ ID NO: 24)

5'-CCAGACAGAAGCTGACGACCACCACCACCGTCCAGTTCTAGAAGTTCCT-3'

gp41 Sense #3 (SEQ ID NO: 25)

5'-GGTCGTCAGCTTCTGTCTGGTATCGTTCAGCAGCAGAACAATCTGCTGCGTGCTATCGAAGCTCAGCAGCATC-3'

gp41 Antisense #4 (SEQ ID NO: 26)

5'-TTCAACAGCCAGGATACGAGCCTGAAGCTGTTTGATACCCCAAACGGTCAGTTGCAGCAGATGCTGCTGAGCTTCGATAGCACG-3'

These oligonucleotides were phosphorylated with T4 polynucleotide kinase, annealed, and used as a template for T4 DNA polymerase to fill in the gaps (Fig. 1). Specifically, ten pmol of each primer was kinased with bacteriophage T4 polynucleotide kinase according to the instructions of the manufacturer, Life Technologies (Rockville, MD). One pmol of each kinased primer was mixed together in a 20 µl annealing reaction. The final buffer components were 20 mM Tris-HCl pH 7.4, 2 mM MgCl2, 50 mM NaCl. Eppendorf tubes containing these reactions were sealed and inverted in a 2 liter beaker of H2O heated to 100 degrees C. These reactions remained in the beaker until the water had reached room temperature. The DNA was then precipitated with ethanol, washed with 70% ethanol and dried. The annealed DNA was dissolved in T4 DNA polymerase buffer according to the specifications of the manufacturer (Life Technologies, Rockville, MD). T4 DNA polymerase was added to fill in the single-stranded gaps, and the reaction was incubated at 37 degrees C for 1 hour. T4 DNA ligase (Life Technologies, Rockville, MD) was then added to seal the nicks within the template.
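Because the antisense oligonucleotides are written 5' to 3' on the opposite strand, their overlap with the sense oligonucleotides is easiest to see after taking the reverse complement. The Python sketch below is an illustrative check written for this description, not part of the original protocol; it covers the three oligonucleotides whose full sequences are listed above (Sense #1 is left out because its sequence is incomplete as printed), and the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence, returned 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_overlap(left: str, right: str) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return k
    return 0


ANTISENSE_2 = "CCAGACAGAAGCTGACGACCACCACCACCGTCCAGTTCTAGAAGTTCCT"
SENSE_3 = ("GGTCGTCAGCTTCTGTCTGGTATCGTTCAGCAGCAGAACAATCTGCTGCGTG"
           "CTATCGAAGCTCAGCAGCATC")
ANTISENSE_4 = ("TTCAACAGCCAGGATACGAGCCTGAAGCTGTTTGATACCCCAAACGGTCAG"
               "TTGCAGCAGATGCTGCTGAGCTTCGATAGCACG")

# Convert the antisense oligos to top-strand sequence before comparing.
top_2 = reverse_complement(ANTISENSE_2)
top_4 = reverse_complement(ANTISENSE_4)

# Shared top-strand sequence = region where the two oligos base-pair.
overlap_2_3 = longest_overlap(top_2, SENSE_3)
overlap_3_4 = longest_overlap(SENSE_3, top_4)

if __name__ == "__main__":
    print(f"Antisense #2 / Sense #3 duplex: {overlap_2_3} nt")
    print(f"Sense #3 / Antisense #4 duplex: {overlap_3_4} nt")
```

Run as written, it reports a 20-nt duplex between Antisense #2 and Sense #3 and a 25-nt duplex between Sense #3 and Antisense #4; these are the paired regions that hold the annealed template together before T4 DNA polymerase fills in the single-stranded gaps.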

After ligation with T4 DNA ligase, the resulting DNA molecule was used as a template for PCR to amplify the reengineered sequence for cloning into an appropriate expression vector (e.g., pET19b or pET21a). Positive clones were sequenced, followed by protein expression and purification studies. Bacteria (BL21[DE3]) were induced to express gp41 by the addition of isopropyl β-D-thiogalactopyranoside (IPTG) to a final concentration of 0.1 mM in a 1 liter LB culture with an OD600 reading of 0.6. The addition of IPTG induced the expression of the T7 RNA polymerase gene, which is integrated in the bacterial chromosomal DNA. This DNA-dependent RNA polymerase then specifically recognized the T7 promoter sequence on the recombinant plasmid DNA (e.g., pET21a and pET19b; Novagen, Madison, WI) and drove the transcription of the gp41 mRNA. This mRNA was then translated at a high level by the E. coli ribosomes.

After expression for two hours at 37 degrees C, the bacteria were harvested by centrifugation, washed with 40 mM Tris-HCl, 100 mM NaCl buffer, and collected by centrifugation again, and the cell pellet was stored at -80 degrees C. Prior to the addition of IPTG and at the time of harvest, 1 mL aliquots were taken for SDS-PAGE analysis to determine whether a protein of the expected molecular weight was expressed. Bacterial pellets demonstrating expression of gp41 were lysed by passage through a French pressure cell. This lysate was centrifuged at 10,000 x g to separate the insoluble material (P10) from the soluble material (S10). Ammonium sulfate was added to the S10 to a final saturation of 35%. The precipitated protein from this treatment was collected by centrifugation at 12,000 rpm. The protein pellet was dissolved in 5 ml of Lysis Buffer (20 mM Tris-HCl pH 8.0, 100 mM NaCl, 1 mM EDTA, and 5% glycerol) and dialyzed against Lysis Buffer with two 1 liter buffer changes. This protein was then loaded on a Uno Q6 column (anion exchanger, BioRad, Hercules, CA) and eluted with a 0.1 M to 1.0 M NaCl linear gradient. Peak fractions were pooled and dialyzed against 20 mM MES pH 6.0, 100 mM NaCl, 1 mM EDTA, and 5% glycerol. Precipitated protein was collected by centrifugation and dissolved in 10 mM Bis-Tris pH 7.7.



Example II 

Linkers Used to Produce Pocket Formation 
Linkers of varying length were engineered into the DNA sequence encoding gp41 (see Fig. 2 for nucleotide and Fig. 4 for peptide sequences). In particular, visual inspection of the proximity of the two termini of the N- and C-helices indicated that peptides of 3, 4, and 5 amino acids in length could span the required distance.

The original clone (Clone 1), containing the nucleotides encoding the GGGG linker, was designed in the initial oligonucleotide annealing and fill-in experiment (Fig. 1a). Clones 1a, 1b, and 1c were constructed by standard PCR in which the sense primer was long enough (i.e., 117-nucleotide sense primers for clones 1a and 1b and a 123-nucleotide sense primer for clone 1c) to cover the linker region. The antisense primer was the same for each clone. After PCR, the amplified DNA of each linker derivative was cut with restriction endonucleases for cloning into the appropriate expression vector (e.g., pET21a). Plasmids positive for DNA inserts were sequenced to verify the changes in the linker sequence. These plasmid DNAs were then used to transform E. coli BL21[DE3] for expression and purification as described above.
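The clones differ only in the codons at the helix junction. The Python sketch below rebuilds each clone from the shared flanking segments and junction codons as read from Figure 2, translates it with the standard genetic code, and reports its length and junction residues. It is an illustrative consistency check under those assumptions; the names C_HELIX_DNA, N_HELIX_DNA and JUNCTIONS are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence into one-letter amino acids."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Segments shared by all Figure 2 clones: Met + C-helix (ending ...ELLELD)
# and the N-helix (starting QLLSGIV...).
C_HELIX_DNA = ("atgacaagcttgatccactctctgatcgaa"
               "gaaagccagaaccagcaggaaaaaaacgaa"
               "caggaacttctagaactggac")
N_HELIX_DNA = ("cagcttctgtctggtatcgttcagcagcag"
               "aacaatctgctgcgtgctatcgaagctcag"
               "cagcatctgctgcaactgaccgtttggggt"
               "atcaaacagcttcaggctcgtatcctggct"
               "gttgaa")

# Junction codons of each clone (linker plus the following R or P).
JUNCTIONS = {
    "Clone 1": "ggtggtggtggtcgt",
    "Clone 1a": "ggtgacggtcgt",
    "Clone 1b": "ggtgacggtccg",
    "Clone 1c": "ggttctaacgacggtcgt",
}

clones = {name: C_HELIX_DNA + junction + N_HELIX_DNA
          for name, junction in JUNCTIONS.items()}
proteins = {name: translate(dna) for name, dna in clones.items()}

if __name__ == "__main__":
    for name, junction in JUNCTIONS.items():
        print(f"{name}: {len(clones[name])} bp, {len(proteins[name])} aa, "
              f"junction {translate(junction)}")
```

The rebuilt clones are 222, 219, 219 and 225 bp long and encode chains of 74, 73, 73 and 75 residues, matching the lengths given for SEQ ID NOs: 1-4 and 5-8 in the Sequence Listing; the junctions translate to GGGGR, GDGR, GDGP and GSNDGR.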






Example III

1-D 1H-NMR Spectrum of Clone 1b and Selective Shifting of
Resonances Upon Addition of Pentapeptide
Nuclear magnetic resonance (NMR) spectroscopy is a proven technique for the detection of ligand binding to proteins. Application of this technique to screening, or structure-activity relationships (SAR) by NMR (Shuker et al., Science 274:1531-1534 (1996)), has successfully identified drug leads against several proteins (WO 97/18471, published May 22, 1997, and WO 97/18469, published May 22, 1997, both of which are incorporated herein by reference). This technique relies on detecting chemical shifts of amide proton and nitrogen atoms resulting from changes in the chemical environment of the peptide backbone, such as those that occur upon ligand binding.

Based on the technique's sensitivity, experiments were designed to evaluate the binding of the peptide with the sequence WRWRI to clone 1b of gp41. The protein encoded by gp41 clone 1b was isotopically labeled with 15N-ammonium chloride in E. coli and purified. The protein was concentrated to 0.4 mM in 10 mM sodium phosphate buffer, pH 7.9, in H2O/D2O (9:1). Amide proton and nitrogen chemical shift measurements were obtained from 15N heteronuclear single quantum coherence (HSQC) experiments at 45 degrees C on a Bruker DRX500 NMR spectrometer (Bruker Instruments, Karlsruhe, Germany). To detect binding of the WRWRI peptide to gp41, the NMR experiment was repeated in the presence of 0.5 mM of the WRWRI peptide using the same buffer conditions described above. The 15N HSQC spectra for the gp41 protein in the presence and absence of this peptide are shown in Figure 5. Both chemical shift changes and line broadening in the presence of the WRWRI peptide indicate that binding of the two molecules had occurred.
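Chemical shift changes in HSQC spectra are often summarized per residue as one weighted value combining the 1H and 15N changes, Δδ = sqrt(ΔδH² + (α·ΔδN)²). The Python sketch below implements that common convention; the weighting factor α = 0.2 and the peak positions in the usage lines are assumptions for illustration, not values from this experiment. Peaks that vanish in the presence of ligand, as with the line broadening noted above, are simply skipped.

```python
import math


def combined_shift_change(delta_h_ppm, delta_n_ppm, n_weight=0.2):
    """Weighted amide shift change sqrt(dH^2 + (n_weight * dN)^2), in ppm."""
    return math.sqrt(delta_h_ppm ** 2 + (n_weight * delta_n_ppm) ** 2)


def shift_perturbations(free_peaks, bound_peaks, n_weight=0.2):
    """Per-residue combined shift change between two assigned peak lists.

    Each argument maps a residue label to a (1H ppm, 15N ppm) tuple.
    Residues whose peak is missing from `bound_peaks` (for example,
    broadened beyond detection) are left out of the result.
    """
    changes = {}
    for residue, (h_free, n_free) in free_peaks.items():
        if residue in bound_peaks:
            h_bound, n_bound = bound_peaks[residue]
            changes[residue] = combined_shift_change(
                h_bound - h_free, n_bound - n_free, n_weight)
    return changes


if __name__ == "__main__":
    # Hypothetical assignments, for illustration only.
    free = {"res_A": (8.10, 120.5), "res_B": (7.85, 118.2), "res_C": (8.40, 123.0)}
    bound = {"res_A": (8.12, 120.6), "res_B": (8.05, 119.4)}  # res_C broadened out
    for residue, change in sorted(shift_perturbations(free, bound).items()):
        print(f"{residue}: {change:.3f} ppm")
```

Residues with the largest values, together with those whose peaks broaden out, are the usual candidates for the ligand contact surface.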
Example IV
Crystalline Structure of Clone 1b

The gp41 clone 1b protein was expressed and purified as described above. The protein was concentrated to approximately 7 mg/ml in 20 mM Hepes buffer at pH 7.5. In this solution, the protein was stable for several months at 4 degrees C. The initial crystals were identified from Hampton Crystal Screen I (Hampton Research, 25431 Cabot Rd., Suite 205, Laguna Hills, CA, 92653-5527), condition number 23 (0.2 M magnesium chloride hexahydrate, 0.1 M Hepes pH 7.5, 30% PEG 400), and from a variety of other related conditions. Crystal growth was optimized with hanging drop vapor diffusion at 17 degrees C under the conditions of 0.2 M magnesium chloride hexahydrate and 0.1 M Tris pH 8.5, screening 33-40% PEG 400, at a protein concentration of 7 mg/ml. Crystals grew to an average size of 400 x 400 x 200 microns in approximately 1-2 weeks.

The crystals could be frozen at 150 K for data collection, using an Oxford cryo-system, by soaking the crystals for at least 1 hour in 23% PEG 400, 0.15 M magnesium chloride hexahydrate, 0.08 M Tris pH 8.5 and 10% glycerol. Under these conditions the crystals diffract to 1.9 Angstrom resolution at a synchrotron radiation source. Native data to 2.8 Angstrom were collected; the crystals belong to the space group P321, with the unit cell parameters a = b = 96.9 Angstrom, c = 72.7 Angstrom, alpha = beta = 90 degrees and gamma = 120 degrees. The structure was solved by molecular replacement using a molecular model based on the coordinates of Protein Data Bank entry 1ENV and the molecular replacement program AMoRe (Collaborative Computational Project, Number 4, 1994, "The CCP4 Suite: Programs for Protein Crystallography", Acta Cryst. D50, 760-763). The structures were refined using CNX (A. T. Brunger, "The Free R Value: a Novel Statistical Quantity for Assessing the Accuracy of Crystal Structures", Nature 355, 472-474 (1992)) using a combination of simulated annealing maximum likelihood refinement and individual B-factor refinement. Electron density maps were inspected on a Silicon Graphics INDIGO2 workstation using the program package QUANTA 98 (Molecular Simulations Inc., San Diego, CA).

The native structure contains four gp41 clone 1b monomers (one monomer consists of one N-helix/C-helix pair) in the asymmetric unit. Three monomers make one native-like gp41 trimer; the other monomer sits on the crystallographic threefold rotation axis, which generates a crystallographic trimer. The native structure was refined to 2.8 Angstrom resolution with an R of 28% and an Rfree of 34%.
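Two pieces of routine bookkeeping sit behind the numbers in this example: the unit cell volume (which, with the four monomers per asymmetric unit, gives the Matthews coefficient V_M) and the R and Rfree statistics (Rfree being computed the same way as R but over reflections held out of refinement, per the Brunger reference above). The Python sketch below implements the textbook definitions. The chain mass used for V_M is a placeholder estimate for a roughly 73-residue chain, not a measured value, and the function names are illustrative.

```python
import math


def unit_cell_volume(a, b, c, alpha_deg, beta_deg, gamma_deg):
    """General (triclinic) unit cell volume in cubic Angstroms."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha_deg, beta_deg, gamma_deg))
    return a * b * c * math.sqrt(1.0 - ca ** 2 - cb ** 2 - cg ** 2 + 2.0 * ca * cb * cg)


def matthews_coefficient(cell_volume, n_asymmetric_units, chains_per_asu, chain_mass_da):
    """V_M = cell volume / (chains in the cell x chain mass), in A^3/Da."""
    return cell_volume / (n_asymmetric_units * chains_per_asu * chain_mass_da)


def r_factor(f_obs, f_calc):
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|) over the given reflections."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(abs(fo) for fo in f_obs)


def r_work_and_free(f_obs, f_calc, in_test_set):
    """R over the working reflections and Rfree over the held-out test set."""
    work = [(fo, fc) for fo, fc, t in zip(f_obs, f_calc, in_test_set) if not t]
    test = [(fo, fc) for fo, fc, t in zip(f_obs, f_calc, in_test_set) if t]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free


if __name__ == "__main__":
    volume = unit_cell_volume(96.9, 96.9, 72.7, 90.0, 90.0, 120.0)
    print(f"P321 cell volume: {volume:,.0f} cubic Angstroms")
    # P321 has six asymmetric units per cell; 8,300 Da is a placeholder mass.
    print(f"V_M with 4 chains/ASU: {matthews_coefficient(volume, 6, 4, 8300.0):.2f} A^3/Da")
    # Made-up amplitudes: two working reflections, two test reflections.
    print(r_work_and_free([10, 20, 10, 10], [9, 22, 10, 5], [False, False, True, True]))
```

For the cell reported above, the volume comes out at about 591,000 cubic Angstroms; V_M then depends only on the chain mass supplied.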

Crystals of gp41 could then be soaked with 100 mM trimethyl lead acetate for 1-2 days in 36% PEG 400, 0.2 M magnesium chloride hexahydrate, 0.1 M bis-tris propane to obtain a lead heavy-atom derivative. In the Fo-Fc maps calculated after rigid-body refinement, seven trimethyl lead acetate sites could be identified. One of these sites was centrally located in the target site normally occupied by the Trp-Trp-Ile of the native C-peptide. Thus, the above data indicate that compounds can be soaked into the binding site.

Example V 

Sedimentation Equilibrium Study of gp41 Clone 4 Construct
The gp41 clone 4 protein (see amino acid sequence below) was analyzed at 15K, 20K, 25K, and 30K rpm at 8 °C in 10 mM PO4, 120 mM NaCl, 2.7 mM KCl, pH 7.4. (See Figure 9 for the nucleotide sequence of the clone.) Data from the different speeds were analyzed globally by nonlinear least squares curve fitting of radial concentration profiles using the Marquardt-Levenberg algorithm as implemented in Origin 5.0 (Microcal Software, Inc., Northampton, MA). A user-defined function describing the sedimentation behavior of discrete particles was used (see equation 1 of Holzman et al., in "Modern Analytical Ultracentrifugation", Birkhauser, Boston, MA, 1994, pp. 298-314). Baselines and fixed radius signal values for each data set were allowed to vary independently; however, the molecular weight was held as a global parameter. The partial specific volume of the protein was calculated from the amino acid composition by the method of Cohn and Edsall as 0.729 cm3/g (Cohn and Edsall, in "Proteins, Amino Acids, and Peptides as Ions and Dipolar Ions", Reinhold, New York, 1943, Chapter 4, page 157). Buffer density was measured at 8 °C as 1.01456 g/cm3 in a Mettler-KEM DA-310 density meter (Mettler, Hightstown, New Jersey). Closed circles (see Figure 7) are the absorbance data points, shown together with the fitted data, and the open circles are the residuals. Initial concentrations were 0.65 mg/ml.
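For a single ideal species, the equilibrium profile is c(r) = c(r0)·exp[M(1 − v̄ρ)ω²(r² − r0²)/(2RT)], so ln c is linear in r² with slope M(1 − v̄ρ)ω²/(2RT). The Python sketch below uses that relationship to recover M from one synthetic profile. It is a simplified single-speed, baseline-free illustration, not the global nonlinear fit performed in Origin; v̄ (0.729 cm3/g), ρ (1.01456 g/cm3) and 8 °C are taken from the text, while the radii, reference concentration and the choice of 20,000 rpm are assumptions.

```python
import math

GAS_CONSTANT = 8.314462618e7  # erg / (mol K), so M comes out in g/mol


def sigma_per_unit_mass(vbar, rho, rpm, temp_k):
    """(1 - vbar*rho) * omega^2 / (R*T), i.e. sigma for M = 1 g/mol, in cm^-2."""
    omega = 2.0 * math.pi * rpm / 60.0
    return (1.0 - vbar * rho) * omega ** 2 / (GAS_CONSTANT * temp_k)


def ideal_profile(radii_cm, c_ref, r_ref_cm, mw, vbar, rho, rpm, temp_k):
    """Concentration at each radius for one ideal, non-interacting species."""
    sigma = mw * sigma_per_unit_mass(vbar, rho, rpm, temp_k)
    return [c_ref * math.exp(0.5 * sigma * (r * r - r_ref_cm * r_ref_cm))
            for r in radii_cm]


def mw_from_profile(radii_cm, conc, vbar, rho, rpm, temp_k):
    """Estimate M from the least-squares slope of ln(c) against r^2.

    The slope equals sigma / 2 = M * sigma_per_unit_mass(...) / 2.
    """
    xs = [r * r for r in radii_cm]
    ys = [math.log(c) for c in conc]
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    return 2.0 * slope / sigma_per_unit_mass(vbar, rho, rpm, temp_k)


if __name__ == "__main__":
    VBAR, RHO, TEMP_K = 0.729, 1.01456, 273.15 + 8.0  # values from the text
    RPM = 20000                                       # one of the four speeds
    radii = [6.90 + 0.01 * i for i in range(31)]      # assumed 6.90-7.20 cm
    profile = ideal_profile(radii, 0.3, radii[0], 33000.0, VBAR, RHO, RPM, TEMP_K)
    recovered = mw_from_profile(radii, profile, VBAR, RHO, RPM, TEMP_K)
    print(f"Recovered M: {recovered:,.0f} g/mol")
```

Real absorbance data also carry a baseline offset and noise, which is one reason a global nonlinear fit across several speeds, as described above, is preferred over this linearization.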

The molecular weight determined from this analysis was 33,000 ± 280, in good agreement with the expected molecular weight for a trimer. This strongly indicates that a construct having a portion of the C-terminal helix of gp41, followed by a linker, followed by the N-terminal helix, is capable of forming the expected quaternary structure in a manner analogous to the intact protein.

Clone 4 amino acid sequence:
MGHHHHHHSSGHIDDDDKYTSLIHSLIEESQNQQEKNEQELLELDGGGGRQLLS
GIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARILAVE

Example VI 

Binding of Cyclic D-Peptides to gp41 by Centrifugal Enhanced
Affinity Selection

Each peptide (D10-P1-2K, amino acid sequence Ac-KKGACEARHEWAWLCAA-NH2, 244 µM, or D10-P5-2K, Ac-KKGACELLGWEWAWLCAA-NH2, 200 µM) was mixed with gp41 clone 1b in the centrifuge tube, vortexed, and spun for 2 hours at an average g-force of 309,880 (100,000 rpm in a TLA-100 rotor, Beckman, Fullerton, CA). Five fractions, 20 microliters each, were removed by carefully pipetting from under the meniscus and examined by reverse-phase high pressure liquid chromatography. The amount of peptide was followed by the peak area of the relevant peaks on the chromatograms. The relevant peak area in each fraction was divided by the sum of the peak areas in all 5 fractions, and the result was plotted versus the cumulative volume of each fraction. (Figure 8A represents the binding of D10-P1-2K, and Figure 6B represents the binding of D10-P5-2K.) The initial concentrations of protein and peptide were 244 µM and 200 µM, respectively. A closed square indicates the amount of peptide in each fraction after centrifugation in the presence of gp41 clone 1b. Open circles indicate the amount of peptide in each fraction after centrifugation in the absence of gp41.
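The normalization described above (each fraction's peak area divided by the total over all five fractions, plotted against cumulative volume) can be sketched in Python as follows. The peak areas in the usage lines are hypothetical, and the helper names are illustrative. The same sketch includes the standard relative centrifugal force relation, RCF = ω²r/g; for example, 309,880 x g at 100,000 rpm corresponds to an average radius of about 2.77 cm.

```python
import math

STANDARD_GRAVITY_CM_S2 = 980.665


def fraction_profile(peak_areas, fraction_volume_ul=20.0):
    """(cumulative volume in uL, share of total peak area) for each fraction.

    Fractions are listed in the order drawn, starting just below the meniscus.
    """
    total = sum(peak_areas)
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    profile = []
    cumulative_ul = 0.0
    for area in peak_areas:
        cumulative_ul += fraction_volume_ul
        profile.append((cumulative_ul, area / total))
    return profile


def angular_velocity(rpm):
    """Rotor speed in rad/s."""
    return 2.0 * math.pi * rpm / 60.0


def relative_centrifugal_force(radius_cm, rpm):
    """RCF in multiples of g: omega^2 * r / g."""
    return angular_velocity(rpm) ** 2 * radius_cm / STANDARD_GRAVITY_CM_S2


def radius_for_rcf(rcf, rpm):
    """Radius (cm) at which a rotor spinning at `rpm` produces `rcf`."""
    return rcf * STANDARD_GRAVITY_CM_S2 / angular_velocity(rpm) ** 2


if __name__ == "__main__":
    # Hypothetical peak areas for five 20-uL fractions, top to bottom.
    with_protein = [5.0, 8.0, 12.0, 25.0, 50.0]
    without_protein = [20.0, 20.0, 20.0, 20.0, 20.0]
    for label, areas in (("with gp41", with_protein), ("without gp41", without_protein)):
        print(label, [(v, round(f, 2)) for v, f in fraction_profile(areas)])
    print(f"Average radius: {radius_for_rcf(309880, 100000):.2f} cm")
```

A ligand that moves toward the bottom fractions only when protein is present, while staying evenly distributed without it, shows the pattern interpreted below as binding.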

As evidenced by differential movement of the peptides in the presence of the protein, both peptides bind very well to gp41 clone 1b. As evidenced by the lack of movement in the absence of the protein, both appear to be quite soluble under the conditions examined. No difference in affinity for gp41 clone 1b was observable from this assay. Recovery of peptides in the presence and absence of protein was good, indicating no loss of ligand during the experiment.

Further, the above results indicate that the present gp41 construct has the ability to bind to a peptide which, in turn, has been shown to bind to a region of the N-terminal helix contained in the present construct.
CLAIMS:

1. An isolated nucleotide sequence selected from
the group consisting of SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID
NO: 3, SEQ ID NO: 4, and a nucleotide sequence having at
least 65% identity to a sequence selected from the group
consisting of SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3 and SEQ
ID NO: 4.

2. A purified polypeptide encoded by a nucleotide
sequence selected from the group consisting of SEQ ID NO: 1,
SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, and a nucleotide
sequence having at least 65% identity to a sequence
selected from the group consisting of SEQ ID NO: 1, SEQ ID
NO: 2, SEQ ID NO: 3 and SEQ ID NO: 4.

3. A purified polypeptide having an amino acid
sequence selected from the group consisting of SEQ ID NO: 5,
SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, and an amino acid
sequence having at least 65% similarity to a sequence
selected from the group consisting of SEQ ID NO: 5, SEQ ID
NO: 6, SEQ ID NO: 7 and SEQ ID NO: 8.

4. A vector comprising said isolated nucleotide
sequence of claim 1.

5. A host cell comprising said vector of claim 4.

6. A method of producing a protein having an
unoccupied Trp-Trp-Ile pocket comprising the steps of:

a) isolating a nucleotide sequence selected from
the group consisting of SEQ ID NO: 1, SEQ ID
NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, and a nucleotide
sequence having at least 65% identity to a
sequence selected from the group consisting of
SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3 and SEQ
ID NO: 4;

b) constructing a vector comprising 1) said
nucleotide sequence of step (a) linked to 2) a
promoter in an operable manner; and

c) transforming a host cell with said vector of
step (b), for a time and under conditions suitable
for expression of said protein.

7. A method of detecting a compound which binds
to gp41 protein comprising the steps of:

a) contacting said compound with a polypeptide
having an amino acid sequence selected from
the group consisting of SEQ ID NO: 5, SEQ ID
NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, and an amino
acid sequence having at least 65% similarity
to a sequence selected from the group
consisting of SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID
NO: 7 and SEQ ID NO: 8, for a time and under
conditions sufficient for the formation of
compound/polypeptide complexes; and

b) detecting the presence of said complexes, wherein
detection indicates the presence of a compound
which binds to gp41 protein.

8. An antibody directed against said polypeptide of
claim 3.

9. A method of detecting a compound which binds to
gp41 protein comprising the steps of:

a) adding an indicator reagent capable of
generating a measurable signal to a polypeptide
having an amino acid sequence selected from the
group consisting of SEQ ID NO: 5, SEQ ID NO: 6,
SEQ ID NO: 7, SEQ ID NO: 8, and an amino acid
sequence having at least 65% similarity to a
sequence selected from the group consisting of
SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7 and SEQ ID
NO: 8, for a time and under conditions
sufficient for the formation of indicator
reagent/polypeptide complexes;

b) contacting said indicator reagent/polypeptide
complexes with said compound, for a time and
under conditions sufficient for the formation
of indicator reagent/polypeptide/compound
complexes; and

c) detecting a measurable signal generated by
said indicator reagent, said measurable signal
indicating the presence of a compound which binds
to gp41 protein.

10. A method of detecting compounds which bind to
gp41 protein from a mixture of compounds having unknown
binding properties comprising the steps of:

a) contacting at least one polypeptide having an
amino acid sequence selected from the group
consisting of SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID
NO: 7, SEQ ID NO: 8, and an amino acid sequence
having at least 65% similarity to a sequence
selected from the group consisting of SEQ ID
NO: 5, SEQ ID NO: 6, SEQ ID NO: 7 and SEQ ID NO: 8,
with said compound mixture for a time and under
conditions sufficient for the formation of
polypeptide/compound complexes;

b) passing said mixture through a means having pores
which allow only molecules below a certain molecular
weight to pass through; and

c) detecting retained polypeptide/compound complexes
which did not pass through said pores, wherein
compounds present in said complexes bind to gp41
protein.
gp41 C-Helix (Gly)4 N-Helix Gene Construction

Oligonucleotides: gp41 Sense #1, gp41 Sense #3, gp41 Antisense #2, gp41 Antisense #4

1) Kinase oligos
2) Anneal oligos
3) Fill in gaps with T4 DNA polymerase, gene 32 protein, dNTPs
4) Ligate with T4 DNA Ligase

Use this 222 bp DNA fragment as a template for PCR amplification (PCR sense oligo, PCR antisense oligo).

FIGURE 1 (1 OF 2)
Annealing of the gp41 oligos (gp41 Sense #1, gp41 Antisense #2, gp41 Sense #3 and gp41 Antisense #4 shown annealed)

FIGURE 1 (2 OF 2)
gp41 Clone #1 (GGGG)R* Linker

5'-ATG ACA AGC TTG ATC CAC TCT CTG ATC GAA GAA AGC CAG AAC
CAG CAG GAA AAA AAC GAA CAG GAA CTT CTA GAA CTG GAC GGT GGT
GGT GGT CGT CAG CTT CTG TCT GGT ATC GTT CAG CAG CAG AAC AAT
CTG CTG CGT GCT ATC GAA GCT CAG CAG CAT CTG CTG CAA CTG ACC
GTT TGG GGT ATC AAA CAG CTT CAG GCT CGT ATC CTG GCT GTT GAA
-3'

gp41 Clone #1a (GDG)R Linker

5'-ATG ACA AGC TTG ATC CAC TCT CTG ATC GAA GAA AGC CAG AAC
CAG CAG GAA AAA AAC GAA CAG GAA CTT CTA GAA CTG GAC GGT GAC
GGT CGT CAG CTT CTG TCT GGT ATC GTT CAG CAG CAG AAC AAT CTG
CTG CGT GCT ATC GAA GCT CAG CAG CAT CTG CTG CAA CTG ACC GTT
TGG GGT ATC AAA CAG CTT CAG GCT CGT ATC CTG GCT GTT GAA -3'

gp41 Clone #1b (GDG)P Linker

5'-ATG ACA AGC TTG ATC CAC TCT CTG ATC GAA GAA AGC CAG AAC
CAG CAG GAA AAA AAC GAA CAG GAA CTT CTA GAA CTG GAC GGT GAC
GGT CCG CAG CTT CTG TCT GGT ATC GTT CAG CAG CAG AAC AAT CTG
CTG CGT GCT ATC GAA GCT CAG CAG CAT CTG CTG CAA CTG ACC GTT
TGG GGT ATC AAA CAG CTT CAG GCT CGT ATC CTG GCT GTT GAA -3'

gp41 Clone #1c (GSNDG)R Linker

5'-ATG ACA AGC TTG ATC CAC TCT CTG ATC GAA GAA AGC CAG AAC
CAG CAG GAA AAA AAC GAA CAG GAA CTT CTA GAA CTG GAC GGT TCT
AAC GAC GGT CGT CAG CTT CTG TCT GGT ATC GTT CAG CAG CAG AAC
AAT CTG CTG CGT GCT ATC GAA GCT CAG CAG CAT CTG CTG CAA CTG
ACC GTT TGG GGT ATC AAA CAG CTT CAG GCT CGT ATC CTG GCT GTT
GAA -3'

FIGURE 2
gp41 Sequence Alignment

pNL4-3: YTSLIHSLIEESQNQQEKNEQELLELD//RQLLSGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARILAVE

Aligned below the pNL4-3 sequence: Clone #1, with the (GGGG) linker at the // junction, and a consensus line.

FIGURE 3
Clone #1
MTSLIHSLIEESQNQQEKNEQELLELD(GGGG)RQLLSGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARILAVE

Clone #1a (GDG)R
MTSLIHSLIEESQNQQEKNEQELLELD(GDG)RQLLSGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARILAVE

Clone #1b (GDG)P
MTSLIHSLIEESQNQQEKNEQELLELD(GDG)PQLLSGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARILAVE

Clone #1c (GSNDG)R
MTSLIHSLIEESQNQQEKNEQELLELD(GSNDG)RQLLSGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARILAVE

In each clone, the C-Helix precedes the parenthesized linker and the N-Helix follows it.

FIGURE 4
15N HSQC spectra of gp41 clone 1b in the presence and absence of the WRWRI peptide (horizontal axis: 1H, ppm), with examples of chemical shift change and line broadening indicated.

FIGURE 5
FIGURE 6
gp41 Clone #4 (GGGG) Linker

5'- ATG GGC CAT CAT CAT CAT CAT CAT CAC AGC AGC GGC CAT ATC 
GAC GAC GAC GAC AAG TAC ACA AGC TTG ATC CAC TCT CTG ATC GAA 
GAA AGC CAG AAC CAG CAG GAA AAA AAC GAA CAG GAA CTT CTA GAA 
CTG GAC GGT GGT GGT GGT CGT CAG CTT CTG TCT GGT ATC GTT CAG 
CAG CAG AAC AAT CTG CTG CGT GCT ATC GAA GCT CAG CAG CAT CTG 
CTG CAA CTG ACC GTT TGG GGT ATC AAA CAG CTT CAG GCT CGT ATC 
CTG GCT GTT GAA -3' 



FIGURE 9 
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gp41 Clone #4 

5 ' - ATG GGC CAT CAT 

GAC GAC GAC GAC AAG 

GAA AGC CAG AAC CAG 

CTG GAC GGT GGT GGT 

CAG CAG AAC AAT CTG 

CTG CAA CTG ACC GTT 
CTG GCT GTT GAA -3' 

gp41 Clone #4a 

5 ' - ATG GGC CAT CAT 
GAC GAC GAC GAC AAG 
GAA AGC CAG AAC CAG 
CTG GAC GGT GAC GGT 
CAG AAC AAT CTG CTG 
CAA CTG ACC GTT TGG 
GCT GTT GAA -3' 

gp41 Clone #4b 

5 ' - ATG GGC CAT CAT 
GAC GAC GAC GAC AAG 
GAA AGC CAG AAC CAG 
CTG GAC GGT GAC GGT 
CAG AAC AAT CTG CTG 
CAA CTG ACC GTT TGG 
GCT GTT GAA -3' 

gp41 Clone #4c 

5 ' - ATG GGC CAT CAT 
GAC GAC GAC GAC AAG 
GAA AGC CAG AAC CAG 
CTG GAC GGT TCT AAC 
CAG CAG CAG AAC AAT 
CTG CTG CAA CTG ACC 
ATC CTG GCT GTT GAA 



(GGGG)R* Linker. 

CAT CAT CAT CAT CAC AGC AGC GGC CAT ATC 
TAC ACA AGC TTG ATC CAC TCT CTG ATC GAA 
CAG GAA AAA AAC GAA CAG GAA CTT CTA GAA 
GGT CGT CAG CTT CTG TCT GGT ATC GTT CAG 
CTG CGT GCT ATC GAA GCT CAG CAG CAT CTG 
TGG GGT ATC AAA CAG CTT CAG GCT CGT ATC 



(GDG)R Linker 

CAT CAT CAT CAT CAC AGC AGC GGC CAT ATC 
TAC ACA AGC TTG ATC CAC TCT CTG ATC GAA 
CAG GAA AAA AAC GAA CAG GAA CTT CTA GAA 
CGT CAG CTT CTG TCT GGT ATC GTT CAG CAG 
CGT GCT ATC GAA GCT CAG CAG CAT CTG CTG 
GGT ATC AAA CAG CTT CAG GCT CGT ATC CTG 



(GDG) P Linker 

CAT CAT CAT CAT CAC AGC AGC GGC CAT ATC 
TAC ACA AGC TTG ATC CAC TCT CTG ATC GAA 
CAG GAA AAA AAC GAA CAG GAA CTT CTA GAA 
CCG CAG CTT CTG TCT GGT ATC GTT CAG CAG_ 
CGT GCT ATC GAA GCT CAG CAG CAT CTG CTG 
GGT ATC AAA CAG CTT CAG GCT CGT ATC CTG 



(GSNDG)R Linker 

CAT CAT CAT CAT CAC AGC AGC GGC CAT ATC 
TAC ACA AGC TTG ATC CAC TCT CTG ATC GAA 
CAG GAA AAA AAC GAA CAG GAA CTT CTA GAA 
GAC GGT CGT CAG CTT CTG TCT GGT ATC GTT 
CTG CTG CGT GCT ATC GAA GCT CAG CAG CAT 
GTT TGG GGT ATC AAA CAG CTT CAG GCT CGT 
-3' 



FIGURE 10 
Clone #4 (GGGG)R
M(7His)SSGHIDDDK YTSLIHSLIEESQNQQEKNEQELLELD(GGGG)RQLLSGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARILAVE

Clone #4a (GDG)R
M(7His)SSGHIDDDK YTSLIHSLIEESQNQQEKNEQELLELD(GDG)RQLLSGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARILAVE

Clone #4b (GDG)P
M(7His)SSGHIDDDK YTSLIHSLIEESQNQQEKNEQELLELD(GDG)PQLLSGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARILAVE

Clone #4c (GSNDG)R
M(7His)SSGHIDDDK YTSLIHSLIEESQNQQEKNEQELLELD(GSNDG)RQLLSGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARILAVE

In each construct, the M(7His)SSGHIDDDK leader ends at an enterokinase (E.K.) cleavage site and is followed by the C-Helix, the parenthesized linker and the N-Helix.

FIGURE 11
SEQUENCE LISTING

<110> Abbott Laboratories 
Stewart, Kent D. 
Steffy, Kevin R. 
Kempf, Dale J. 
Harris, Kevin S. 
Huth, Jeffrey R. 
Stoll, Vincent S. 
Harlan, John E. 
Ng, Iok C. 
Betz, Stephen F. 

<120> ENGINEERED CHIMERA OF PROTEIN FRAGMENTS 
AND METHODS OF USE THEREOF 



<130> 6749.US.01

<140> 09/698,311 
<141> 2000-10-27 

<160> 28 

<170> FastSEQ for Windows Version 4.0 

<210> 1 
<211> 222 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> gp41 Clone #1 
<400> 1 

atgacaagct tgatccactc tctgatcgaa gaaagccaga accagcagga aaaaaacgaa 60
caggaacttc tagaactgga cggtggtggt ggtcgtcagc ttctgtctgg tatcgttcag 120
cagcagaaca atctgctgcg tgctatcgaa gctcagcagc atctgctgca actgaccgtt 180
tggggtatca aacagcttca ggctcgtatc ctggctgttg aa 222

<210> 2
<211> 219
<212> DNA

<213> Artificial Sequence
<220>

<223> gp41 Clone #1a
<400> 2

atgacaagct tgatccactc tctgatcgaa gaaagccaga accagcagga aaaaaacgaa 60
caggaacttc tagaactgga cggtgacggt cgtcagcttc tgtctggtat cgttcagcag 120
cagaacaatc tgctgcgtgc tatcgaagct cagcagcatc tgctgcaact gaccgtttgg 180
ggtatcaaac agcttcaggc tcgtatcctg gctgttgaa 219



<210> 3 
<211> 219 
<212> DNA

<213> Artificial Sequence
<220>

<223> gp41 Clone #1b
<400> 3

atgacaagct tgatccactc tctgatcgaa gaaagccaga accagcagga aaaaaacgaa 60
caggaacttc tagaactgga cggtgacggt ccgcagcttc tgtctggtat cgttcagcag 120
cagaacaatc tgctgcgtgc tatcgaagct cagcagcatc tgctgcaact gaccgtttgg 180
ggtatcaaac agcttcaggc tcgtatcctg gctgttgaa 219

<210> 4 
<211> 225 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> gp41 Clone #1c
<400> 4

atgacaagct tgatccactc tctgatcgaa gaaagccaga accagcagga aaaaaacgaa 60
caggaacttc tagaactgga cggttctaac gacggtcgtc agcttctgtc tggtatcgtt 120
cagcagcaga acaatctgct gcgtgctatc gaagctcagc agcatctgct gcaactgacc 180
gtttggggta tcaaacagct tcaggctcgt atcctggctg ttgaa 225

<210> 5 
<211> 74 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Chimera protein encoded by gp41 Clone #1
<400> 5

Met Thr Ser Leu Ile His Ser Leu Ile Glu Glu Ser Gln Asn Gln Gln
1 5 10 15
Glu Lys Asn Glu Gln Glu Leu Leu Glu Leu Asp Gly Gly Gly Gly Arg
20 25 30
Gln Leu Leu Ser Gly Ile Val Gln Gln Gln Asn Asn Leu Leu Arg Ala
35 40 45
Ile Glu Ala Gln Gln His Leu Leu Gln Leu Thr Val Trp Gly Ile Lys
50 55 60
Gln Leu Gln Ala Arg Ile Leu Ala Val Glu
65 70

<210> 6 
<211> 73 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Chimera protein encoded by gp41 Clone #la 
<400> 6 

Met Thr Ser Leu He His Ser Leu He Glu Glu Ser Gin Asn Gin Gin 
1 5 10 15 



60 
120 
180 
225 
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Glu Ly S Asn Glu Gin Glu Leu Leu Glu Leu Asp Gly Asp Gly Arg Gin 

Leu Leu Ser Gly lie Val Gin Gin Gin Asn Asn Leu Leu Arg Ala lie 

Glu Ala Gin Gin His Leu Leu Sn Leu Thr Val Trp Gly He Lys Gin 

50 55 ■ 60 

Leu Gin Ala Arg He Leu Ala Val Glu 
65 70 



<210> 7 
<211> 73 
<212> PRT 

<213> Artificial Sequence 
<220>

<223> Chimera protein encoded by gp41 Clone #1b

<400> 7

Met Thr Ser Leu Ile His Ser Leu Ile Glu Glu Ser Gln Asn Gln Gln
1               5                   10                  15

Glu Lys Asn Glu Gln Glu Leu Leu Glu Leu Asp Gly Asp Gly Pro Gln
            20                  25                  30

Leu Leu Ser Gly Ile Val Gln Gln Gln Asn Asn Leu Leu Arg Ala Ile
        35                  40                  45

Glu Ala Gln Gln His Leu Leu Gln Leu Thr Val Trp Gly Ile Lys Gln
    50                  55                  60

Leu Gln Ala Arg Ile Leu Ala Val Glu
65                  70

<210> 8 
<211> 75 

<212> PRT 

<213> Artificial Sequence 



<220>

<223> Chimera protein encoded by gp41 Clone #1c
<400> 8

Met Thr Ser Leu Ile His Ser Leu Ile Glu Glu Ser Gln Asn Gln Gln
1               5                   10                  15

Glu Lys Asn Glu Gln Glu Leu Leu Glu Leu Asp Gly Ser Asn Asp Gly
            20                  25                  30

Arg Gln Leu Leu Ser Gly Ile Val Gln Gln Gln Asn Asn Leu Leu Arg
        35                  40                  45

Ala Ile Glu Ala Gln Gln His Leu Leu Gln Leu Thr Val Trp Gly Ile
    50                  55                  60

Lys Gln Leu Gln Ala Arg Ile Leu Ala Val Glu
65                  70                  75



<210> 9 
<211> 279 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> gp41 Clone #4 
<400> 9

atgggccatc atcatcatca tcatcacagc agcggccata tcgacgacga cgacaagtac 60 

acaagcttga tccactctct gatcgaagaa agccagaacc agcaggaaaa aaacgaacag 120 

gaacttctag aactggacgg tggtggtggt cgtcagcttc tgtctggtat cgttcagcag 180 

cagaacaatc tgctgcgtgc tatcgaagct cagcagcatc tgctgcaact gaccgtttgg 240 

ggtatcaaac agcttcaggc tcgtatcctg gctgttgaa 279



<210> 10 
<211> 276 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> gp41 Clone #4a 



<400> 10 

atgggccatc atcatcatca tcatcacagc agcggccata tcgacgacga cgacaagtac 60 

acaagcttga tccactctct gatcgaagaa agccagaacc agcaggaaaa aaacgaacag 120 

gaacttctag aactggacgg tgacggtcgt cagcttctgt ctggtatcgt tcagcagcag 180 

aacaatctgc tgcgtgctat cgaagctcag cagcatctgc tgcaactgac cgtttggggt 240 

atcaaacagc ttcaggctcg tatcctggct gttgaa 276



<210> 11
<211> 276
<212> DNA

<213> Artificial Sequence
<220>

<223> gp41 Clone #4b

<400> 11

atgggccatc atcatcatca tcatcacagc agcggccata tcgacgacga cgacaagtac 60

acaagcttga tccactctct gatcgaagaa agccagaacc agcaggaaaa aaacgaacag 120

gaacttctag aactggacgg tgacggtccg cagcttctgt ctggtatcgt tcagcagcag 180

aacaatctgc tgcgtgctat cgaagctcag cagcatctgc tgcaactgac cgtttggggt 240

atcaaacagc ttcaggctcg tatcctggct gttgaa 276

<210> 12
<211> 282
<212> DNA

<213> Artificial Sequence
<220>

<223> gp41 Clone #4c

<400> 12

atgggccatc atcatcatca tcatcacagc agcggccata tcgacgacga cgacaagtac 60

acaagcttga tccactctct gatcgaagaa agccagaacc agcaggaaaa aaacgaacag 120

gaacttctag aactggacgg ttctaacgac ggtcgtcagc ttctgtctgg tatcgttcag 180

cagcagaaca atctgctgcg tgctatcgaa gctcagcagc atctgctgca actgaccgtt 240

tggggtatca aacagcttca ggctcgtatc ctggctgttg aa 282



<210> 13 
<211> 93 
<212> PRT 

<213> Artificial Sequence 
<220>

<223> Chimera protein encoded by gp41 Clone #4

<400> 13

Met Gly His His His His His His His Ser Ser Gly His Ile Asp Asp
1               5                   10                  15

Asp Asp Lys Tyr Thr Ser Leu Ile His Ser Leu Ile Glu Glu Ser Gln
            20                  25                  30

Asn Gln Gln Glu Lys Asn Glu Gln Glu Leu Leu Glu Leu Asp Gly Gly
        35                  40                  45

Gly Gly Arg Gln Leu Leu Ser Gly Ile Val Gln Gln Gln Asn Asn Leu
    50                  55                  60

Leu Arg Ala Ile Glu Ala Gln Gln His Leu Leu Gln Leu Thr Val Trp
65                  70                  75                  80

Gly Ile Lys Gln Leu Gln Ala Arg Ile Leu Ala Val Glu
                85                  90

<210> 14
<211> 92
<212> PRT

<213> Artificial Sequence

<220>

<223> Chimera protein encoded by gp41 Clone #4a

<400> 14

Met Gly His His His His His His His Ser Ser Gly His Ile Asp Asp
1               5                   10                  15

Asp Asp Lys Tyr Thr Ser Leu Ile His Ser Leu Ile Glu Glu Ser Gln
            20                  25                  30

Asn Gln Gln Glu Lys Asn Glu Gln Glu Leu Leu Glu Leu Asp Gly Asp
        35                  40                  45

Gly Arg Gln Leu Leu Ser Gly Ile Val Gln Gln Gln Asn Asn Leu Leu
    50                  55                  60

Arg Ala Ile Glu Ala Gln Gln His Leu Leu Gln Leu Thr Val Trp Gly
65                  70                  75                  80

Ile Lys Gln Leu Gln Ala Arg Ile Leu Ala Val Glu
                85                  90

<210> 15
<211> 92
<212> PRT

<213> Artificial Sequence

<220>

<223> Chimera protein encoded by gp41 Clone #4b

<400> 15

Met Gly His His His His His His His Ser Ser Gly His Ile Asp Asp
1               5                   10                  15

Asp Asp Lys Tyr Thr Ser Leu Ile His Ser Leu Ile Glu Glu Ser Gln
            20                  25                  30

Asn Gln Gln Glu Lys Asn Glu Gln Glu Leu Leu Glu Leu Asp Gly Asp
        35                  40                  45

Gly Pro Gln Leu Leu Ser Gly Ile Val Gln Gln Gln Asn Asn Leu Leu
    50                  55                  60

Arg Ala Ile Glu Ala Gln Gln His Leu Leu Gln Leu Thr Val Trp Gly

WO 02/J4W9 



VC T/USO I/4XO40 



6/9 



65                  70                  75                  80

Ile Lys Gln Leu Gln Ala Arg Ile Leu Ala Val Glu
                85                  90



<210> 16 
<211> 94 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Chimera protein encoded by gp41 Clone #4c 
<400> 16 



Met Gly His His His His His His His Ser Ser Gly His Ile Asp Asp
1               5                   10                  15

Asp Asp Lys Tyr Thr Ser Leu Ile His Ser Leu Ile Glu Glu Ser Gln
            20                  25                  30

Asn Gln Gln Glu Lys Asn Glu Gln Glu Leu Leu Glu Leu Asp Gly Ser
        35                  40                  45

Asn Asp Gly Arg Gln Leu Leu Ser Gly Ile Val Gln Gln Gln Asn Asn
    50                  55                  60

Leu Leu Arg Ala Ile Glu Ala Gln Gln His Leu Leu Gln Leu Thr Val
65                  70                  75                  80

Trp Gly Ile Lys Gln Leu Gln Ala Arg Ile Leu Ala Val Glu
                85                  90







<210> 17 
<211> 70 
<212> PRT 
<213> Unknown 



<220>

<223> Chimera protein of "Wild" Type HIV-1

<400> 17

Tyr Thr Ser Leu Ile His Ser Leu Ile Glu Glu Ser Gln Asn Gln Gln
1               5                   10                  15

Glu Lys Asn Glu Gln Glu Leu Leu Glu Leu Asp Arg Gln Leu Leu Ser
            20                  25                  30

Asp Ile Val Gln Gln Gln Asn Asn Leu Leu Arg Ala Ile Glu Ala Gln
        35                  40                  45

Gln His Leu Leu Gln Leu Thr Val Trp Gly Ile Lys Gln Leu Gln Ala
    50                  55                  60

Arg Ile Leu Ala Val Glu
65                  70



<210> 18 
<211> 5 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Linker 



<400> 18 

Gly Gly Gly Gly Arg 
1               5

<210> 19
<211> 4 
<212> PRT 

<213> Artificial Sequence 

<220> 

<223> Linker 

<400> 19 
Gly Asp Gly Arg 
1 

<210> 20 
<211> 4 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Linker 

<400> 20

Gly Asp Gly Pro
1

<210> 21
<211> 6
<212> PRT

<213> Artificial Sequence

<220>

<223> Linker

<400> 21
Gly Ser Asn Asp Gly Arg
1               5

<210> 22
<211> 4
<212> PRT
<213> Artificial Sequence

<220>

<223> Linker

<400> 22
Gly Gly Gly Gly
1

<210> 23 
<211> 64 
<212> DNA 

<213> Artificial Sequence 

<220> 

<223> gp41 Sense oligonucleotide #1 
<400> 23

tacacaagct tgatccactc tctgatcgaa gaaagccaga accagcagga aaaaaacgaa 60 

<210> 24 
<211> 69 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> gp41 Antisense oligonucleotide #2 
<400> 24 

ccagacagaa gctgacgacc accaccaccg tccagttcta gaagttcctg ttcgtttttt 60 
tcctgctgg 69

<210> 25 
<211> 73 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> gp41 Sense oligonucleotide #3 
<400> 25 

ggtcgtcagc ttctgtctgg tatcgttcag cagcagaaca atctgctgcg tgctatcgaa 60 
gctcagcagc atc 73

<210> 26 
<211> 84 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> gp41 Antisense oligonucleotide #4 
<400> 26 

ttcaacagcc aggatacgag cctgaagctg tttgataccc caaacggtca gttgcagcag 60
atgctgctga gcttcgatag cacg 84 

<210> 27 
<211> 19 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> His tag/E. K. Cleavage site 
<400> 27 

Met Gly His His His His His His His Ser Ser Gly His Ile Asp Asp
1               5                   10                  15

Asp Asp Lys 



<210> 28 
<211> 5 
<212> PRT 
<213> Artificial Sequence
<220> 

<223> Pentapeptide 
<400> 28 

Trp Arg Trp Arg Ile
1 5 